from profiles import train, X_train, Y_train, X_valid, Y_valid, column_names, treatment_col
from model import train_xgb_model, train_logistic, evaluate_uplift, simple_network, check_acc_diff, check_uplift_diff, local_search_xgb
from explanations import pdp_plot_uplift, ale_plot_uplift
train.head()
This is an uplift modelling dataset with the treatment indicator coded as segment, which originally had 3 levels: no communication, men's communication, and women's communication. For simplicity, the latter two are merged into one here (gender is already available in a separate column). The columns represent (after feature engineering):
It's easy to see that the variables above are highly correlated (history > 0 <-> newbie, etc.), so some of the redundant ones were removed.
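Such redundancies can be detected with a simple correlation screen; below is a minimal sketch, with toy data and a threshold of 0.7 that are assumptions, not the exact columns or cutoff used here:

```python
import pandas as pd

# Toy frame mimicking the dataset: `newbie` is largely determined by `history`.
df = pd.DataFrame({
    "history": [0.0, 100.0, 0.0, 50.0, 200.0, 0.0],
    "newbie":  [1,   0,     1,   0,    0,     1],
    "recency": [10,  2,     1,   9,    3,     4],
})

# Absolute pairwise correlations; keep one column from each correlated pair.
corr = df.corr().abs()
to_drop = set()
cols = list(corr.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if corr.loc[a, b] > 0.7 and a not in to_drop and b not in to_drop:
            to_drop.add(b)  # arbitrarily drop the second of the pair

reduced = df.drop(columns=sorted(to_drop))
print(sorted(to_drop))  # → ['newbie']
```

Here `history`/`newbie` correlate strongly, so `newbie` is dropped, while `recency` survives the screen.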
r_xgb_model = local_search_xgb(X_train, Y_train, X_valid, Y_valid, treatment_col, just_get_model=True)
check_acc_diff(r_xgb_model, "Robust XGBoost", X_train, Y_train, X_valid, Y_valid)
check_uplift_diff(r_xgb_model, "Robust XGBoost", X_train, Y_train, X_valid, Y_valid, treatment_col)
This model was found by a local search over hyper-parameters: approximate the local derivative of each hyper-parameter by sampling nearby values, cross-validate the model with them, then move in the ascending direction, looping over all parameters for several iterations until nothing changes, which means a local optimum has been reached. Models found this way are highly robust, and as we can see, the scores on the train and validation datasets are close (here, due to random fluctuations, validation actually scores better). However, this cross-validation did not use accuracy or loss as its metric; instead it used the adjusted gain, calculated as the gain of the current model compared to a random model, normalized by the maximum achievable score (visible in the plots above for the train and validation datasets respectively). This metric can also be interpreted as the relative monetary profit from using this algorithm instead of a random one.
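The search loop described above can be sketched as generic coordinate ascent; this is an illustration of the idea, not the actual `local_search_xgb` implementation, and the step sizes, parameter names, and toy score function are all assumptions (the toy score stands in for the cross-validated adjusted gain):

```python
def local_search(score_fn, params, steps, max_iters=20):
    # For each parameter in turn, probe nearby values and keep the best-scoring
    # one; stop when a full sweep changes nothing (a local optimum).
    best = dict(params)
    best_score = score_fn(best)
    for _ in range(max_iters):
        improved = False
        for name, step in steps.items():
            for candidate in (best[name] - step, best[name] + step):
                trial = dict(best, **{name: candidate})
                s = score_fn(trial)
                if s > best_score:
                    best, best_score = trial, s
                    improved = True
        if not improved:  # no single-parameter move helps any more
            break
    return best, best_score

# Toy score with its maximum at depth=6, lr=0.1.
score = lambda p: -((p["depth"] - 6) ** 2) - 100 * (p["lr"] - 0.1) ** 2
best, s = local_search(score, {"depth": 3, "lr": 0.3}, {"depth": 1, "lr": 0.05})
print(best)  # depth converges to 6, lr to ~0.1
```

Each sweep is cheap relative to the cross-validation it wraps, which is why the method loops over all parameters rather than doing a full grid search.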
pdp_plot_uplift(r_xgb_model, X_train, Y_train, treatment_col)
ale_plot_uplift(r_xgb_model, X_train, Y_train, treatment_col)
Unfortunately, we were not able to make the feature names appear on the plots, but the ids correspond to those in the dataset description.
The first thing to note is that the ALE plots are very close to the PDP profiles, so the model has no (major) interactions. This does not mean, however, that we could use a classifier without interactions: here we model the uplift directly, which is a difference between predictions and is therefore itself an interaction.
Also worth noting: plot 11 represents the treatment variable, and since we substitute this variable during prediction, the explanation function cannot see any effect of tweaking it independently (either way, it is simply ignored).
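The substitution mentioned above can be illustrated with a minimal sketch; the `predict_uplift` helper below is an assumption about how the prediction works, not the code used in this notebook. The model is scored twice, with the treatment column forced to 1 and to 0, so perturbing the original treatment value has no effect:

```python
import numpy as np

def predict_uplift(model, X, treatment_col):
    # Score everyone as treated and as untreated; uplift is the difference.
    X_t = X.copy(); X_t[:, treatment_col] = 1.0
    X_c = X.copy(); X_c[:, treatment_col] = 0.0
    return model(X_t) - model(X_c)

# Toy "model": responds to treatment (column 1) via interaction with column 0.
model = lambda X: 1 / (1 + np.exp(-(0.5 * X[:, 0] * X[:, 1] - 0.2)))

X = np.array([[1.0, 0.0], [2.0, 1.0], [0.5, 1.0]])
uplift = predict_uplift(model, X, treatment_col=1)

# Overwriting column 1 changes nothing: it is replaced before scoring.
X_perturbed = X.copy(); X_perturbed[:, 1] = 0.5
assert np.allclose(uplift, predict_uplift(model, X_perturbed, treatment_col=1))
```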
The sum of previously bought products, history (1), being the most distinctive variable, makes the model prone to overfitting; its strong fluctuations reaffirm this. Intuitively, the function should be increasing; the staircase shape is probably due to the tree-based nature of the classifier.
Days since last purchase (1). The plot shows vague, flat sinusoidal fluctuations, with the global maximum being the first of the local maxima. Intuitively, the function should be concave, aiming at a "sweet spot" between the customer having "just walked out of the shop" and having "forgotten about it". The plot can be read this way, with some noise at larger values, where the classifier is slightly overfitted to memorize distinctive cases. This plot is also a valuable asset in itself: it can be used to choose the best time to run a campaign for a given customer.
nn_model = simple_network(X_train, Y_train, X_valid, Y_valid)
check_acc_diff(nn_model, "Simple neural network", X_train, Y_train, X_valid, Y_valid)
check_uplift_diff(nn_model, "Simple neural network", X_train, Y_train, X_valid, Y_valid, treatment_col)
This is an overfitted small neural network (ReLU, 3 layers with 50 channels each, batch norm, trained with early stopping on the validation loss), completely unable to learn robust features.
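The early-stopping rule mentioned above can be sketched generically; this is a pure-Python illustration, not the actual `simple_network` training loop, and the patience value and loss curve are assumptions:

```python
def early_stopping_index(valid_losses, patience=3):
    """Epoch at which training stops: when the validation loss has not
    improved for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # patience exhausted: stop here
    return len(valid_losses) - 1  # ran out of epochs without triggering

# Validation loss falls, then overfitting sets in and it rises again.
losses = [0.70, 0.62, 0.58, 0.57, 0.59, 0.61, 0.64, 0.66]
stop = early_stopping_index(losses)
print(stop)  # → 6 (best epoch was 3, then 3 epochs without improvement)
```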
pdp_plot_uplift(nn_model, X_train, Y_train, treatment_col)
ale_plot_uplift(nn_model, X_train, Y_train, treatment_col)
The neural network model also does not seem to have any major interactions; keep in mind, though, that this model has a barely positive score on the validation dataset. The first thing that stands out from these plots is that the estimated values differ in scale, but for our uplift score this is not a concern: we only sort observations by the estimates, so only relative values matter.
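That only the ordering matters can be checked directly: two score vectors related by a monotone transformation (toy numbers below are an illustration) induce the same ranking, and hence the same gain curve.

```python
# Two uplift estimates that disagree in absolute value but agree in ordering.
scores_a = [0.10, -0.05, 0.30, 0.02]
scores_b = [2.0 * s + 5.0 for s in scores_a]  # monotone transformation

def ranking(scores):
    # Indices of observations, from highest to lowest estimated uplift.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

assert ranking(scores_a) == ranking(scores_b)  # identical targeting order
print(ranking(scores_a))  # → [2, 0, 3, 1]
```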
The patterns seen in the previous model repeat, except for dwelling place. This model suggests that campaigns are less persuasive for people from more rural areas.
Counter-intuitively, this function is convex rather than concave, a major difference between the two models here.
This model suggests that the more a client spends, the less the campaign matters to them, which may be because heavy spenders are "sure buyers".